Benchmarking Commercial OCR Engines for Technical Drawings Indexing
Abstract
The choice of a commercial Optical Character Recognition (OCR) engine is important for the process of automatically indexing technical drawings from their title blocks. We would like to benchmark commercial OCR engines with respect to their inclusion in the global digitalisation chain from scanning to understanding the text information contained in a technical drawing document. The crucial (costly) point is the manual correction of OCR recognition errors. By benchmarking, we intend to identify, for our application domain, the causes for OCR errors which are the most costly to
Similar resources
A Retrieval System for Graphical Documents
We present a method for indexing line drawings automatically. The indexing scheme is used for the retrieval of line drawings in a weighted information retrieval (IR) system. Being content-based, the indexing method depends not only on the graphical structures in the drawings, but on the textual entries as well. No a priori knowledge is used in the indexing scheme, since application-specific assu...
Automatic Indexing for Storage and Retrieval of Line Drawings
The usefulness of a collection of scanned graphical documents can be measured by the facilities available for their retrieval. We present an approach for indexing a collection of line drawings automatically. The indexing is based on the textual and graphical content of the drawings. This approach has been developed to facilitate 'retrieval by example' in heterogeneous collections of graphical doc...
Towards content-based retrieval of technical drawings through high-dimensional indexing
This paper presents a new approach to classify, index and retrieve technical drawings by content. Our work uses spatial relationships, visual elements and high-dimensional indexing mechanisms to retrieve complex drawings from CAD databases. This contrasts with conventional approaches which use mostly textual metadata for the same purpose. Creative designers and draftspeople often re-use data fr...
Content-Based Image Retrieval Systems: A Survey
In many areas of commerce, government, academia, and hospitals, large collections of digital images are being created. Many of these collections are the product of digitizing existing collections of analogue photographs, diagrams, drawings, paintings, and prints. Usually, the only way of searching these collections was by keyword indexing, or simply by browsing. Digital images databases however...
Reliable OCR solution for digital content re-mastering
This paper addresses the system’s aspects of OCR solutions in the context of digital content re-mastering. It analyzes the unique requirements and challenges to implement a reliable OCR system in a high-volume and unattended environment. A new reliability metric is proposed and a practical solution based on the combination of multiple commercial OCR engines is introduced. Experimental results s...